Skip to content

SeaDatabricksClient: Add Metadata Commands #593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 119 commits into from
Jun 26, 2025

Conversation

varun-edachali-dbx
Copy link
Collaborator

@varun-edachali-dbx varun-edachali-dbx commented Jun 11, 2025

What type of PR is this?

  • Feature

Description

Add metadata command implementations for the SeaDatabricksClient (execution phase) - get_catalogs, get_schemas, get_tables and get_columns.

How is this tested?

The coverage of the functionality added (by test_filters.py and the new tests in test_sea_backend.py) are as below:

Module Statements Missing Coverage Notes
filters.py 33 1 97% Line 21: from databricks.sql.result_set import ResultSet, SeaResultSet (TYPE_CHECKING import)
sea/backend.py (metadata methods) 121 0 100% Fully covered

Related Tickets & Documents

https://docs.google.com/document/d/1Y-eXLhNqqhrMVGnOlG8sdFrCxBTN1GdQvuKG4IfHmo0/edit?usp=sharing

Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Signed-off-by: varun-edachali-dbx <[email protected]>
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@databricks databricks deleted a comment from github-actions bot Jun 26, 2025
@databricks databricks deleted a comment from github-actions bot Jun 26, 2025
@databricks databricks deleted a comment from github-actions bot Jun 26, 2025
@databricks databricks deleted a comment from github-actions bot Jun 26, 2025
Signed-off-by: varun-edachali-dbx <[email protected]>
Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Copy link
Contributor

@jayantsing-db jayantsing-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Added some questions.

session_id=session_id,
max_rows=max_rows,
max_bytes=max_bytes,
lz4_compression=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not using compression for metadata?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a side effect of setting use_cloud_fetch=False: compression is not supported for INLINE + JSON in SEA.

use_cloud_fetch=False,
parameters=[],
async_op=False,
enforce_embedded_schema_correctness=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a thrift-specific param?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it is a param passed to Cursor's execute method, so I don't see a way to not include it in our execute_command, if it were passed as a connection level property then we may have been able to access it that way. Should I try to avoid passing it to the SEA backend?

) -> "ResultSet":
"""Get columns by executing 'SHOW COLUMNS IN CATALOG catalog [SCHEMA LIKE pattern] [TABLE LIKE pattern] [LIKE pattern]'."""
if not catalog_name:
raise ValueError("Catalog name is required for get_columns")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so for the caller (client) code, it will appear as ValueError. is it okay? should we throw something else as per spec? how does the other backend throws error in first-class APIs like these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch, I changed it to raise DatabaseError which seems the most appropriate, I also changed a few other methods to raise ProgrammingError instead of ValueError.

# Create a new ResultData object with filtered data
from databricks.sql.backend.sea.models.base import ResultData

result_data = ResultData(data=filtered_rows, external_links=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so when you manually set the result data in result set, what will happen in the fetch phase? will it use the execute-response to re-fetch the data? are there any other examples in the codebase, where we manually set the result data or is this the first instance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we construct the ResultData by setting the data field and not external_links, the SeaResultSet is effectively instantiated as it would be during INLINE + JSON_ARRAY querying.

We also set it during the instantiation of the ResultSet in the SEA backend, as seen here.

Copy link

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@varun-edachali-dbx varun-edachali-dbx merged commit e380654 into sea-migration Jun 26, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants